Summary. The origin of severe acute respiratory syndrome-associated coronavirus
(SARS-CoV) is still a matter of speculation, although more than one year has
passed since the onset of the SARS outbreak. In this study, we implemented a
3-step strategy to test the intriguing hypothesis that SARS-CoV might have been
derived from a recombinant virus. First, we ran a BLAST search of the whole
SARS-CoV genome against a virus database to identify viruses of interest. Second,
we employed 7 recombination detection techniques, each well documented as
successfully detecting recombination events, to explore the presence of
recombination in the SARS-CoV genome. Finally, we conducted phylogenetic analyses
to further explore whether recombination has indeed occurred in the course of
coronavirus history predating the emergence of SARS-CoV. Surprisingly, we found
that 7 putative recombination regions, located in replicase 1ab and the spike
protein, exist between SARS-CoV and 6 other coronaviruses: porcine epidemic
diarrhea virus (PEDV), transmissible gastroenteritis virus (TGEV), bovine
coronavirus (BCoV), human coronavirus 229E (HCoV), murine hepatitis virus (MHV),
and avian infectious bronchitis virus (IBV). Thus, our analyses substantiate the
presence of recombination events in the history that led to the SARS-CoV genome.
Like the other coronaviruses used in the analysis, SARS-CoV also has a mosaic
structure.


Introduction 


SARS, a new disease characterized by high fever, malaise, rigor, headache and
non-productive cough, has spread to over 30 countries, with an average mortality
rate of around 8%. Sequence analysis of SARS coronavirus (SARS-CoV)
[17, 25] showed that it is a novel coronavirus [12]. Anand et al. [1] reported
a three-dimensional model of the SARS-CoV main proteinase and suggested that
modified rhinovirus 3Cpro inhibitors could be useful for SARS therapy. Lipsitch
et al. [15] developed a mathematical model of SARS transmission to estimate
the infectiousness of SARS and the likelihood of an outbreak. Ng et al. [22]
suggested that SARS-CoV could have been derived from an innocuous virus, or
one causing a mild disease, that would become virulent after some mutational
event occurring in some carriers. However, the source of SARS-CoV is not yet
exactly known, although it has been reported that a virus highly related to
SARS-CoV has infected some wild animals, such as the masked palm civet, raccoon
dog and badger [7].

Recombination, a key evolutionary process, accounts for a considerable 
amount of genetic diversity in natural populations. The occurrence of high- 
frequency homologous RNA recombination is one of the most intriguing aspects 
of coronavirus replication [14, 27, 31, 34]. The first experimental evidence for 
IBV recombination was found by Kottier et al. [11], although other studies have 
concluded that recombination is a feature of IBV evolution [4, 5, 10, 36-38]. 
Recombination in MHV was also experimentally demonstrated [16]. In partic- 
ular, Snijder et al. [30] indicated that the recombination occurred between a 
coronavirus/torovirus-like virus and an influenza C-like virus, resulting in a line 
of coronaviruses that had a haemagglutinin esterase (HE) gene. This prompted 
us to explore the possible role of recombination in the emergence of SARS-CoV. 
A recent report indicated that SARS-CoV has been found in a number of wild 
animals with 99.8% identity [7]. What would be the role of recombination in the 
event that created this virus, possibly in a predator animal? 

Stavrinides and Guttman [32] have suggested that a possible past recombina- 
tion event between mammalian-like and avian-like parent viruses is responsible 
for the evolution of SARS-CoV. In order to further test for the recombination 
hypothesis, we implemented a 3-step strategy. First, we employed BLAST to 
determine which viruses (coronaviruses or other viruses) should be included in 
the sample relevant for recombination detection analysis. Second, we applied
widely used recombination detection techniques to detect the occurrence of
recombination between SARS-CoV and other coronaviruses. Finally, we used
phylogenetic tree analysis to confirm the presence of recombination events.


Materials and methods 


Sequences 


A reference SARS-CoV genome sequence (NC_004718) [17] was downloaded from
GenBank. In order to determine which viruses (coronaviruses or other viruses) should be
included in the sample relevant for recombination detection analysis, we ran a BLAST search
of the whole SARS-CoV sequence against a virus database. The result indicated 6 significant
hits (E-value < 0.0001; Table 1): Murine hepatitis virus (MHV), Porcine epidemic diarrhea
virus (PEDV), Bovine coronavirus (BCoV), Transmissible gastroenteritis virus (TGEV), Avian
infectious bronchitis virus (IBV) and Human coronavirus 229E (HCoV). All these sequences
were downloaded from GenBank: MHV (AF029248), PEDV (AF353511), BCoV (NC_003045),
TGEV (NC_002306), IBV (NC_001451) and HCoV (NC_002645).
Table 1. Search results by BLAST


Virus                                     Score (bits)  E-value
Murine hepatitis virus                    92            2.00E-16
Porcine epidemic diarrhea virus           80            8.00E-13
Bovine coronavirus                        72            2.00E-10
Transmissible gastroenteritis virus       58            3.00E-06
Avian infectious bronchitis virus         58            3.00E-06
Human coronavirus 229E                    54            4.00E-05
Ovine astrovirus                          48            0.003
Streptococcus pyogenes                    44            0.043
Saccharomyces cerevisiae chromosome       42            0.17
Saccharomyces cerevisiae chromosome       40            0.67
Equine rhinitis B virus                   40            0.67
Equine rhinovirus 3                       40            0.67
Callitrichine herpesvirus 3               40            0.67
Turkey astrovirus                         40            0.67
Amsacta moorei entomopoxvirus             40            0.67
Salmonella typhimurium bacteriophage      38            [illegible]
Goatpox virus                             38            [illegible]
Bacteriophage SPBc2                       38            2.1
Saccharomyces cerevisiae chromosome       38            [illegible]
Shrimp white spot syndrome virus          38            [illegible]
Tupaia paramyxovirus                      38            [illegible]
Rachiplusia ou multiple nucleohedrovirus  38            [illegible]
Lumpy skin disease virus                  38            [illegible]
Sheeppox virus                            38            [illegible]
Human papillomavirus type 59              38            [illegible]
Citrus tristeza virus                     38            [illegible]
Pseudomonas phage phiKZ                   38            [illegible]
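
The selection rule applied to the BLAST results (retain hits with E-value < 0.0001) can be
illustrated with a short Python sketch. The hit list is copied from the first eight rows of
Table 1; the names `Hit` and `significant_hits` are illustrative assumptions and are not part
of BLAST or of the original analysis.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    virus: str
    e_value: float


# First eight rows of Table 1 (virus name and E-value only).
HITS = [
    Hit("Murine hepatitis virus", 2.00e-16),
    Hit("Porcine epidemic diarrhea virus", 8.00e-13),
    Hit("Bovine coronavirus", 2.00e-10),
    Hit("Transmissible gastroenteritis virus", 3.00e-06),
    Hit("Avian infectious bronchitis virus", 3.00e-06),
    Hit("Human coronavirus 229E", 4.00e-05),
    Hit("Ovine astrovirus", 0.003),
    Hit("Streptococcus pyogenes", 0.043),
]

E_VALUE_CUTOFF = 1e-4  # significance level stated in the text


def significant_hits(hits, cutoff=E_VALUE_CUTOFF):
    """Return the hits whose E-value is strictly below the cutoff."""
    return [h for h in hits if h.e_value < cutoff]


if __name__ == "__main__":
    for hit in significant_hits(HITS):
        print(f"{hit.virus}\t{hit.e_value:.2e}")
```

Applied to these rows, the cutoff keeps exactly the six coronaviruses used in the analysis
and drops the astrovirus and bacterial hits.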


Recombination detection and phylogenetic analysis 


There are a number of methods and software packages that have been developed for detection 
of recombination events in DNA sequences. The performance of these methods has been 
extensively evaluated and compared on simulated and real data [23, 24]. In the present study 
we applied these methods to RNA viruses. The genomes of SARS-CoV and the 6 other
coronaviruses (SARS-CoV, IBV, BCoV, HCoV, MHV, PEDV, TGEV) were first aligned using CLUSTALW
[33]. Sites with gaps were removed and a 25077-nt alignment was generated. Subsequently, 
seven methods were employed to detect the occurrence of recombination (see corresponding 
reference in parenthesis for details of each method): BOOTSCAN [26], GENECONV [28], 
DSS (Difference of Sums of Squares) [20], HMM (Hidden Markov Model) [8], MAXCHI 
(Maximum Chi-Square method) [19], PDM (Probabilistic Divergence Measures) [9], RDP 
(Recombination Detection Program) [18]. 
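
The gap-stripping step mentioned above (removing every alignment site that contains a gap)
can be sketched in Python as follows. This is only an illustration on toy sequences; the
function name `strip_gapped_columns` is hypothetical, and the original analysis may have
used the alignment software or another tool for this step.

```python
def strip_gapped_columns(alignment, gap_chars="-."):
    """Drop every column in which at least one sequence has a gap.

    `alignment` maps sequence names to equal-length aligned strings.
    """
    names = list(alignment)
    seqs = [alignment[name] for name in names]
    if len({len(s) for s in seqs}) > 1:
        raise ValueError("aligned sequences must have equal length")
    # Keep only the columns that are gap-free in every sequence.
    kept = [col for col in zip(*seqs) if not any(c in gap_chars for c in col)]
    return {name: "".join(col[i] for col in kept) for i, name in enumerate(names)}


if __name__ == "__main__":
    toy = {
        "SARS-CoV": "ATG-CCGTA",
        "IBV": "ATGACC-TA",
        "MHV": "ATGACCGTA",
    }
    for name, seq in strip_gapped_columns(toy).items():
        print(name, seq)
```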

BOOTSCAN, MAXCHI and RDP are implemented in the RDP software package,
http://web.uct.ac.za/depts/microbiology/microdescription.htm. GENECONV is implemented
in the program of the same name, http://www.math.wustl.edu/~sawyer/geneconv/. DSS, HMM
and PDM are implemented in the TOPALi software package, http://www.bioss.sari.ac.uk/software.html.
Default parameter settings were used in all the programs, except for the following values:
gscale = 1 (GENECONV), internal and external references (RDP), window size = 300 and
step = 10 (DSS, HMM and PDM).
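
To make the window settings concrete, the following Python sketch enumerates sliding windows
of 300 sites advanced in steps of 10 sites along the 25077-nt alignment. It only illustrates
the window geometry, using 0-based half-open coordinates as an assumption; it does not
reproduce how DSS, HMM or PDM place or score their windows internally, and `sliding_windows`
is a hypothetical helper.

```python
def sliding_windows(alignment_length, window=300, step=10):
    """Yield (start, end) pairs, 0-based and half-open, for full-size windows."""
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    for start in range(0, alignment_length - window + 1, step):
        yield start, start + window


if __name__ == "__main__":
    windows = list(sliding_windows(25077))
    print(len(windows), windows[0], windows[-1])
```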

After potential recombination events had been identified by at least 3 of the methods above,
separate neighbor joining trees were constructed for each putative recombination region to
better evaluate the evidence for conflicting evolutionary histories of different sequence
regions. All trees were produced with TOPALi, mentioned above.


Results


Recombination detection


Table 2 summarizes the results of BOOTSCAN analysis with 100% bootstrap
support and significant P-values (<0.05 for both the uncorrected and MC corrected
P-values). Two regions (13151-13299 and 16051-16449, positions in the alignment) are
identified as putative recombination regions, with all 6 coronaviruses as potential
parents and SARS-CoV as the potential daughter.

GENECONV detected 9 putative recombination events over a wide range of
positions, 5941-24997 (in the alignment), at a significance level of p < 0.05 for
two P-values: the simulated P-value (based on 10,000 permutations) and the
BLAST-like BC KA P-value (Table 3). All 6 coronaviruses are potential parents, with
SARS-CoV as the potential daughter.

MAXCHI identified 15 putative recombination events (Table 4; possible
misidentification events are not retained). Most of the breakpoints are
significant at about the 0.001 level; the positions in the alignment span from 3534
to 22840, but some beginning or ending breakpoints are not determined.
Similarly, all 6 coronaviruses are potential parents, with SARS-CoV as the potential
daughter.

RDP revealed 6 putative recombination events in the alignment region
5910-13334 (Table 5), with uncorrected and MC corrected P-values
below 0.002 and 0.05, respectively. In this case, 4 coronaviruses
(IBV, BCoV, MHV and PEDV) are potential parents, with SARS-CoV as the potential
daughter.

Figure 1 shows the DSS profiles of putative breakpoints between SARS-CoV
and other coronaviruses (the dotted line indicates the 95th percentile under the null
hypothesis of no recombination): SARS-CoV, IBV, BCoV and MHV (Fig. 1a),
SARS-CoV, MHV, PEDV and TGEV (Fig. 1b), SARS-CoV, IBV, HCoV and
TGEV (Fig. 1c). There are about 6 different breakpoints (significant peaks):
13614 and 16085 (Fig. 1a), 11008 and 12850 (Fig. 1b), 12805, 13614 and 16444
(Fig. 1c).

HMM plots for SARS-CoV, IBV, BCoV and HCoV (Fig. 2) revealed that 
the putative breakpoints are at about position 5500 and 19000. There is a clear 
transition from state 1 (SARS-CoV grouped with IBV) (Fig. 2a) into state 3 
(SARS-CoV grouped with HCoV) (Fig. 2c). The region between 5500 and 19000 
is noisy, and at this moment no information can be provided by HMM. 

Figure 3 shows the results of PDM analysis performed on SARS-CoV and
other coronaviruses (the dotted line indicates the 95% critical region for the null
hypothesis of no recombination): SARS-CoV, IBV, BCoV and MHV (Fig. 3a, b),
SARS-CoV, MHV, PEDV and TGEV (Fig. 3c, d), SARS-CoV, BCoV, HCoV
and MHV (Fig. 3e, f). A number of breakpoints (pronounced peaks) could be
identified: 6380, 13479, 18915 and 20263 (Fig. 3a, b), 1753, 5032, 9256, 10289,
15591, 19050 and 22195 (Fig. 3c, d), 1393, 6111, 16624, 19859 and 20802
(Fig. 3e, f).


Table 3. Recombination regions identified by GENECONV method


Identified by  Daughter  Parent  Beginning in  Ending in  Simulated  BC KA
                                 alignment     alignment  P-value    P-value
GENECONV       SARS      IBV     24970         24997      0.0001     0.00003
GENECONV       SARS      IBV     20708         20727      0.0156     0.0172
GENECONV       SARS      BCoV    12102         12135      0.0329     0.04634
GENECONV       SARS      BCoV    11977         12024      0.0051     0.00509
GENECONV       SARS      BCoV    5941          5965       0.0051     0.00509
GENECONV       SARS      HCoV    10491         10524      0.0033     0.00361
GENECONV       SARS      MHV     12595         12664      0.0185     0.01999
GENECONV       SARS      PEDV    13208         13263      0.0076     0.00827
GENECONV       SARS      TGEV    8399          8425       0.0315     0.02951


Table 4. Recombination regions identified by MAXCHI method


Identified by  Daughter  Major          Minor   Beginning in  Ending in     Beginning   Ending
                         parent         parent  alignment     alignment     breakpoint  breakpoint
                                                                            P-value     P-value
Maxchi         SARS      PEDV           TGEV    9052          9066          0.028108    0.00065
Maxchi         SARS      IBV            HCoV    undetermined  5486          -           0.000336
Maxchi         SARS      HCoV           IBV     14026         undetermined  0.000913    -
Maxchi         SARS      PEDV           TGEV    10668         undetermined  0.000957    -
Maxchi         SARS      Unknown (MHV)  IBV     20676         22840         0.000913    0.000913
Maxchi         SARS      Unknown (MHV)  IBV     undetermined  8996          -           0.000957
Maxchi         SARS      MHV            BCoV    16609         undetermined  0.000913    -
Maxchi         SARS      MHV            BCoV    20514         undetermined  7.75E-06    -
Maxchi         SARS      MHV            HCoV    undetermined  3534          -           0.000336
Maxchi         SARS      PEDV           HCoV    18528         undetermined  0.001015    -
Maxchi         SARS      PEDV           HCoV    undetermined  7281          -           0.00065
Maxchi         SARS      PEDV           HCoV    15742         15763         0.001015    0.009907
Maxchi         SARS      HCoV           PEDV    9137          9156          0.000913    0.010587
Maxchi         SARS      PEDV           HCoV    5474          undetermined  0.000957    -
Maxchi         SARS      HCoV           TGEV    12854         undetermined  0.000253    -


Table 5. Recombination regions identified by RDP method


Identified by  Daughter  Major   Minor   Beginning in  Ending in  Uncorrected  MC corrected
                         parent  parent  alignment     alignment  P-value      P-value
RDP            SARS      IBV     BCoV    5910          6111       5.18E-04     1.81E-02
RDP            SARS      IBV     BCoV    6136          6286       1.56E-05     5.45E-04
RDP            SARS      IBV     MHV     6134          6326       1.28E-03     4.49E-02
RDP            SARS      BCoV    PEDV    13151         13280      3.32E-04     1.16E-02
RDP            SARS      MHV     PEDV    9196          9334       1.72E-05     6.03E-04
RDP            SARS      MHV     PEDV    13152         13334      3.89E-05     1.36E-03


Fig. 1. Predicting recombination regions with DSS (Difference of Sums of Squares)
implemented in TOPALi. Default parameter values were used except for the Fitch method,
where window size = 300 and step = 10 were chosen. The horizontal axis represents the
site in the alignment, the vertical axis represents the DSS statistic, and the dotted line shows
the 95th percentile under the null hypothesis of no recombination. SARS-CoV, IBV, BCoV and
MHV for Fig. 1a, SARS-CoV, MHV, PEDV and TGEV for Fig. 1b, and SARS-CoV, IBV,
HCoV and TGEV for Fig. 1c, where SARS-CoV: severe acute respiratory syndrome-associated
coronavirus; PEDV: porcine epidemic diarrhea virus; TGEV: transmissible gastroenteritis
virus; BCoV: bovine coronavirus; HCoV: human coronavirus; MHV: murine hepatitis virus;
IBV: avian infectious bronchitis virus


Fig. 2. Predicting recombination regions with HMM (Hidden Markov Model) implemented
in TOPALi. Default parameter values were used. The horizontal axis represents the site
in the alignment, the vertical axis represents the probability for topology change, and
the dotted line shows the 95th percentile under the null hypothesis of no recombination.
SARS-CoV, IBV, BCoV and HCoV were used, where SARS-CoV: severe acute respiratory
syndrome-associated coronavirus; BCoV: bovine coronavirus; HCoV: human coronavirus;
IBV: avian infectious bronchitis virus

Fig. 3. Predicting recombination regions with PDM (Probabilistic Divergence Measures)
implemented in TOPALi. Default parameter values were used with the exception that window
size = 300 and step = 10 were used. The horizontal axis represents the site in the alignment,
the vertical axis represents the global and local divergence measures, and the dotted line
shows the 95% critical region for the null hypothesis of no recombination. SARS-CoV,
IBV, BCoV and MHV for Fig. 3a, b, SARS-CoV, MHV, PEDV and TGEV for Fig. 3c, d,
and SARS-CoV, BCoV, HCoV and MHV for Fig. 3e, f, where SARS-CoV: severe acute
respiratory syndrome-associated coronavirus; PEDV: porcine epidemic diarrhea virus; TGEV:
transmissible gastroenteritis virus; BCoV: bovine coronavirus; HCoV: human coronavirus;
MHV: murine hepatitis virus; IBV: avian infectious bronchitis virus


Posada [23] suggested that one should not rely too much on a single method
for recombination detection. Here we consider the regions identified by at least 3
methods as putative recombination regions. The results are summarized in Table 6.
Seven putative recombination regions span positions 7475 to 24133 of the SARS-CoV
genome. These regions were extracted separately for phylogenetic analysis.
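
The "at least 3 methods" criterion can be illustrated with a small Python sketch that finds
alignment positions covered by intervals from three or more distinct methods. The intervals
below are taken from the BOOTSCAN regions quoted in the text and from Tables 3 and 5; the
strict-overlap rule and the function name `consensus_regions` are assumptions made for
illustration, and the criterion actually used to build Table 6 may be more permissive
(for example, by also counting breakpoints that lie near a region).

```python
def consensus_regions(calls, min_methods=3):
    """Return maximal (start, end) intervals supported by >= min_methods methods.

    `calls` is a list of (method, start, end) tuples with inclusive coordinates.
    """
    # Every interval start and every position just past an interval end is a
    # boundary; the set of supporting methods is constant between boundaries.
    boundaries = sorted({p for _, s, e in calls for p in (s, e + 1)})
    regions = []
    for left, nxt in zip(boundaries, boundaries[1:]):
        right = nxt - 1
        methods = {m for m, s, e in calls if s <= left and right <= e}
        if len(methods) < min_methods:
            continue
        if regions and regions[-1][1] == left - 1:
            regions[-1] = (regions[-1][0], right)  # extend the previous region
        else:
            regions.append((left, right))
    return regions


# Alignment coordinates reported for three of the seven methods.
CALLS = [
    ("BOOTSCAN", 13151, 13299),
    ("BOOTSCAN", 16051, 16449),
    ("GENECONV", 13208, 13263),
    ("GENECONV", 5941, 5965),
    ("RDP", 5910, 6111),
    ("RDP", 13151, 13280),
    ("RDP", 13152, 13334),
]

if __name__ == "__main__":
    print(consensus_regions(CALLS))
```

With only these three methods included, the single stretch supported by all of them is
13208-13263 in the alignment; adding the calls from the remaining methods would be needed to
approach the seven regions reported.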


Phylogenetic analysis 


Fig. 4. Phylogenetic analysis of putative recombination regions. Neighbour joining trees were
constructed by TOPALi. The sequence region in the alignment used for each tree is written
below each figure. The phylogenetic trees in the left panel correspond to non-recombination
regions and the phylogenetic trees in the right panel correspond to recombination regions.
All branch lengths are drawn to scale. Six coronaviruses (IBV, BCoV, HCoV, MHV,
PEDV and TGEV) are potential parents of SARS-CoV, where SARS-CoV: severe acute
respiratory syndrome-associated coronavirus; PEDV: porcine epidemic diarrhea virus; TGEV:
transmissible gastroenteritis virus; BCoV: bovine coronavirus; HCoV: human coronavirus;
MHV: murine hepatitis virus; IBV: avian infectious bronchitis virus


Phylogenetic trees constructed from the putative recombination regions and
non-recombination regions identified by the above techniques are shown in Figure 4.
The left panels show non-recombination regions and the right panels show
recombination regions. We compared each row of figures and found that the
phylogenetic tree in the left panel (non-recombination region) had a very different
topology from the phylogenetic tree in the right panel (recombination
region), which indicates that recombination has occurred. For example, in Fig. 4a,
the 7 coronaviruses are divided into 4 groups: group 1 for TGEV, HCoV and PEDV,
group 2 for BCoV and MHV, group 3 for IBV, and group 4 for SARS-CoV,
consistent with Marra et al. [17]. In Fig. 4b, by contrast, the 7 coronaviruses are divided
into 2 groups: group 1 for IBV, TGEV, HCoV and PEDV, and group 2 for BCoV, MHV
and SARS-CoV, suggesting that SARS-CoV is most closely related to BCoV and
MHV in this region, which is consistent with a recent report [29]. In other regions,
SARS-CoV is instead most closely related to TGEV (Fig. 4d) and IBV (Fig. 4f). Thus,
phylogenetic analysis substantiates the presence of recombination events in the history
that led to the SARS-CoV genome.
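
The contrast between the groupings described for Fig. 4a and Fig. 4b can be stated
compactly in Python. The sketch below encodes only the group memberships given in the text,
not the trees themselves, and the helper `groupmates` is a hypothetical name introduced for
illustration.

```python
# Group memberships as described in the text for Fig. 4a (non-recombination
# region) and Fig. 4b (recombination region).
FIG_4A = [{"TGEV", "HCoV", "PEDV"}, {"BCoV", "MHV"}, {"IBV"}, {"SARS-CoV"}]
FIG_4B = [{"IBV", "TGEV", "HCoV", "PEDV"}, {"BCoV", "MHV", "SARS-CoV"}]


def groupmates(groups, virus):
    """Return the other members of the group that contains `virus`."""
    for group in groups:
        if virus in group:
            return group - {virus}
    raise KeyError(virus)


if __name__ == "__main__":
    print("Fig. 4a:", sorted(groupmates(FIG_4A, "SARS-CoV")))
    print("Fig. 4b:", sorted(groupmates(FIG_4B, "SARS-CoV")))
```

SARS-CoV stands alone in the first grouping and joins BCoV and MHV in the second, which is
the kind of region-dependent change in relationships that the argument relies on.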


Discussion 


In this study, seven recombination detection methods and phylogenetic analyses
were applied to SARS-CoV and the six coronaviruses identified by BLAST
(IBV, BCoV, HCoV, MHV, PEDV and TGEV). These techniques have successfully
identified recombination events in bacteria and viruses [2, 3, 6, 21, 26, 39]. Our
analyses concur in suggesting the occurrence of recombination events between
ancestors of SARS-CoV and these 6 coronaviruses. Indeed, pairwise alignment
showed that many segments of high homology with IBV, BCoV, HCoV, MHV,
PEDV and TGEV do exist in the SARS-CoV genome. Table 7 lists the segments
with length >20 nt and identity >80%, and Fig. 5 shows the mosaic structure of the
region 14930-15908 in the SARS-CoV genome based on the segments with length
>50 nt and identity >80%. Of course, the other coronaviruses used in the analysis
also have mosaic structures, for more sequence similarities exist among them than
with SARS-CoV.
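
The thresholds used for Table 7 (length >20 nt and identity >80%) can be applied
programmatically. The Python sketch below recomputes segment length and match percent for
a few rows of Table 7; rounding the percentage up is an assumption, chosen because it
reproduces the printed values for these rows, and the function names are illustrative.

```python
# (begin, end, identical positions, source) for a few rows of Table 7.
ROWS = [
    (19396, 19420, 24, "BCoV"),
    (19517, 19564, 42, "MHV"),
    (20712, 20747, 33, "IBV"),
    (25068, 25088, 21, "MHV"),
]


def segment_length(begin, end):
    """Length of an inclusive coordinate range."""
    return end - begin + 1


def match_percent(identical, length):
    """Percentage of identical positions, rounded up to an integer (assumed)."""
    return -(-100 * identical // length)


def passes_thresholds(begin, end, identical, min_len=20, min_pct=80):
    """True if the segment is longer than min_len and more than min_pct% identical."""
    length = segment_length(begin, end)
    return length > min_len and 100 * identical > min_pct * length


if __name__ == "__main__":
    for begin, end, identical, source in ROWS:
        length = segment_length(begin, end)
        print(begin, end, length, identical, match_percent(identical, length), source)
```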

It should be noted that all the sequence comparisons in this study are based on
nucleotide sequences. The protein sequences of SARS-CoV are largely
different from those of the three known groups of coronaviruses [17]. For the S
protein, for example, the identity is 25.9% between SARS-CoV and BCoV, 21.7% between
SARS-CoV and HCoV, 21.5% between SARS-CoV and IBV, 25.6% between SARS-CoV and MHV,
20.6% between SARS-CoV and PEDV, and 19.4% between SARS-CoV and TGEV. Although
SARS-CoV is close to BCoV, MHV, TGEV and IBV, the corresponding protein,
replicase 1a, is still different, with identities of 27.4% between SARS-CoV and BCoV,
24.8% between SARS-CoV and IBV, 32.2% between SARS-CoV and MHV, and 25.0% between
SARS-CoV and TGEV.

Naturally, we should take into account the role of convergent evolution, which
would leave its mark on the viral genome. The recombination events that we
observed in SARS-CoV involve six different viruses, suggesting sequential
horizontal transfers and progressive adaptation to new host cells or animals.
Indeed, because viruses must both use receptors to enter host cells and resist
the immune response of the host, their outer layer proteins are subjected to an
extremely strong selection pressure that may considerably restrict the possible
variations of the corresponding proteins (and accordingly of the corresponding
pieces of genome sequence). It is nevertheless remarkable that, despite the
inclusion of all possible types of viruses in our sample set (as well as shuffled
genomes from the viruses we have identified as relevant), we find a more or less
single category of viruses as similar to SARS-CoV. This suggests that even if
the contribution of convergent evolution is important, it occurred on a more or
less common phylogenetic background, suggesting several steps of recombination
followed by fine adaptation. In this context, we would like to suggest that ancestors
of PEDV, MHV or both are the most plausible origin of SARS-CoV. Guan et al. [7]
indicated that there are 38 nucleotide polymorphisms (26 of them
non-synonymous) in the S genes of human SARS-CoV viruses compared to animal
SARS-CoV-like viruses, although the additional 29-nucleotide sequence in the
animal viruses is located in ORF10, not in the S protein. These polymorphisms could
be responsible for changes in host range and tissue tropism among coronaviruses,
for a single nucleotide change can dramatically alter the behaviour of the virus [35].


Table 7. Mosaic segments in SARS-CoV genome (length >20 nt and identity >80%)


Beginning in  Ending in  Length  Identity  Match percent  Source
SARS          SARS                         (%)
10063         10109      -       -         -              MHV
10609         10641      -       -         -              TGEV
12821         12854      -       -         -              HCoV
13844         13879      -       -         -              BCoV
13845         13879      -       -         -              MHV
13986         14011      -       -         -              PEDV
14365         14412      -       -         -              BCoV
14367         14395      -       -         -              MHV
14490         14523      -       -         -              BCoV
14589         14632      -       -         -              BCoV
14685         14729      -       -         -              MHV
14724         14746      -       -         -              IBV
14808         14835      -       -         -              HCoV
14913         14947      -       -         -              HCoV
14933         15070      -       -         -              BCoV
14982         15091      -       -         -              IBV
14986         15055      -       -         -              MHV
15062         15093      -       -         -              HCoV
15123         15173      -       -         -              TGEV
15210         15232      -       -         -              PEDV
15210         15238      -       -         -              BCoV
15210         15253      -       -         -              IBV
15417         15482      -       -         -              BCoV
15417         15457      -       -         -              IBV
15420         15479      -       -         -              MHV
15611         15682      -       -         -              PEDV
15624         15670      -       -         -              HCoV
15633         15672      -       -         -              TGEV
15729         15770      -       -         -              MHV
15765         15817      -       -         -              HCoV
15852         15908      -       -         -              MHV
17088         17125      -       -         -              IBV
17688         17714      -       -         -              TGEV
17757         17800      -       -         -              PEDV
17783         17809      -       -         -              HCoV
18558         18577      -       -         -              PEDV
18771         18847      -       -         -              TGEV
18784         18833      -       -         -              HCoV
19102         19132      -       -         -              IBV
19113         19132      -       -         -              HCoV
19146         19252      -       -         -              MHV
19201         19252      -       -         -              IBV
19206         19253      -       -         -              BCoV
19396         19420      -       -         -              MHV
19396         19420      25      24        96             BCoV
19517         19564      48      42        88             MHV
19548         19588      41      37        91             TGEV
20709         20746      38      34        90             MHV
20712         20747      36      33        92             IBV
20793         20839      47      41        88             HCoV
20797         20827      31      28        91             PEDV
25062         25084      23      22        96             IBV
25068         25088      21      21        100            MHV
25068         25090      23      22        96             TGEV
25068         25090      23      22        96             BCoV
29593         29621      29      29        100            IBV


Fig. 5. Mosaic structure of the region 14930-15908 in SARS-CoV genome. Six
coronaviruses (IBV, BCoV, HCoV, MHV, PEDV and TGEV) are potential parents of SARS-CoV,
where SARS-CoV: severe acute respiratory syndrome-associated coronavirus; PEDV:
porcine epidemic diarrhea virus; TGEV: transmissible gastroenteritis virus; BCoV: bovine
coronavirus; HCoV: human coronavirus; MHV: murine hepatitis virus; IBV: avian
infectious bronchitis virus

Based on phylogenetic techniques and BOOTSCAN recombination analysis,
Stavrinides and Guttman [32] indicated that the replicase of SARS-CoV has
a mammalian-like origin, the M and N proteins have an avian-like origin, and
the S protein has a mammalian-avian mosaic origin. In the present study, by contrast,
we used phylogenetic analysis and 7 recombination detection methods to conduct a
whole-genome recombination analysis. These include MAXCHI and GENECONV, reported
as powerful among the 14 methods compared in [23, 24] (SIMPLOT (BOOTSCAN),
GENECONV, HOMOPLASY TEST, PIST, MAXCHI, CHIMAERA, PHYPRO, PLATO, RDP,
RECPARS, RETICULATE, RUNS TEST, SNEATH TEST, TRIPLE). We identified seven
putative recombination regions, which encompass, in terms of the proteins involved,
replicase 1A, replicase 1B and the spike glycoprotein. Stavrinides and Guttman [32]
primarily inferred the occurrence of recombination qualitatively, but did not identify
the precise recombination regions in the proteins involved. The S protein is an
exception: they identified a recombination region located between nucleotides 2472
and 2694 of the S protein gene, i.e. between nucleotides 23963 and 24185 of the
SARS-CoV genome, which is largely covered by the last recombination region for the
S protein (Table 6). Most importantly, each of our recombination regions is identified by
at least 3 methods, because one should not rely too much on a single method, as
suggested in [23]. Overall, we believe the two studies lead to the same conclusion:
the evolution of SARS-CoV has involved recombination.
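
The coordinate conversion quoted above (nucleotides 2472-2694 of the S protein gene
corresponding to nucleotides 23963-24185 of the genome) implies a constant offset of 21491,
that is, an S coding region starting at genome position 21492. The Python sketch below makes
this arithmetic explicit; the start position is inferred from the two pairs of numbers in
the text rather than stated by the authors, and the helper names are hypothetical.

```python
# Inferred from the text: 23963 - 2472 + 1 (1-based, inclusive coordinates).
S_GENE_START = 21492


def gene_to_genome(pos_in_gene, gene_start=S_GENE_START):
    """Convert a 1-based position within a gene to a 1-based genome position."""
    return gene_start + pos_in_gene - 1


def genome_to_gene(pos_in_genome, gene_start=S_GENE_START):
    """Convert a 1-based genome position to a 1-based position within the gene."""
    return pos_in_genome - gene_start + 1


if __name__ == "__main__":
    print(gene_to_genome(2472), gene_to_genome(2694))
    print(genome_to_gene(23963), genome_to_gene(24185))
```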

The recombination event in the replicase is related to the fact that the RNA
polymerase of coronaviruses utilizes a discontinuous transcription mechanism to
synthesize mRNAs. The viral polymerase must jump between different RNA
templates regularly during positive- or negative-strand RNA synthesis and,
depending on the rejoining sites, the resultant RNA recombination will be either
homologous or nonhomologous. This is the copy-choice model of recombination
in RNA viruses [13, 27, 31, 34]. The recombination event in the S protein is certainly
important, since it allows the virus to alter its surface antigenicity and escape
immune surveillance in the animals, thus adapting to a human host.

The existence of SARS-CoV-like viruses (99.8% homology to human SARS- 
CoV) in several wild animals in a live animal market in Guangdong [7] indicated 
that interspecies transmission among the human and animal SARS-CoV-like 
viruses had occurred. The mutation analysis of sequence variations among these 
isolates will help identify the genetic signature of SARS virus strains when a 
sufficient amount of sequence data is available. 

The very fact that several species of animals are affected does not allow one to
trace the origin of the virus directly to an endemic infection in one of these species.
Rather, it might indicate that animals and humans were contaminated by a virus
from a common origin, presumably located in animal food present in local markets
in Guangdong province. Investigating a wide variety of animal coronaviruses,
especially in relation to rodents, birds, snakes and farm animals, would be
interesting with regard to the origin of the SARS-CoV that caused disease in humans.

Finally, a challenging question arises. What is the molecular basis of
recombination in SARS-CoV? Several requirements must be met for recombination to
occur: (1) Two coronaviruses must infect a host simultaneously and continue to
replicate without interfering with each other; (2) Sufficient nucleotide identity
between these genomes is essential for genome-switching to occur during RNA
replication; (3) The proteins arising from recombination must be functional; (4)
The recombinant virus must have some selective advantage for its survival. That
is, the recombination that creates a successful “new” coronavirus is probably a rare
event. So, we must stress that the potential recombination events in SARS-CoV
identified in the present study are most likely “old” events, which may have
occurred thousands of years ago. Although recent findings
indicated that SARS-CoV did exist in a number of wild animals [7], we have not
yet determined where these SARS-CoV-like virus strains come from.